Predicting speech intelligibility in conditions with nonlinearly processed noisy speech
Abstract
The speech-based envelope power spectrum model (sEPSM; [1]) was proposed to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM uses the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating interferers [2]. However, the model fails in the case of phase jitter distortion, in which the spectral structure of speech is affected but the temporal envelope is maintained. This suggests that a mechanism operating across audio frequencies is required to account for this distortion. It is demonstrated that a measure of the across-audio-frequency variance at the output of the modulation-frequency selective process in the model is sufficient to account for the phase jitter distortion. Thus, a joint spectro-temporal modulation analysis, as proposed in [3], does not seem to be required. The results are consistent with concepts from computational auditory scene analysis and further support the hypothesis that the SNRenv is a powerful metric for speech intelligibility prediction.
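The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the sEPSM implementation: it assumes the temporal envelopes have already been extracted for a single channel, it skips the audio and modulation filterbanks, and the function names and the `floor` value are hypothetical. It uses the usual definition of SNRenv as the envelope power of the noisy speech in excess of the noise alone, divided by the envelope power of the noise. The abstract does not say how the across-audio-frequency variance is computed or combined with SNRenv, so `across_channel_variance` is only a plain population variance over per-channel values.

```python
import math


def normalized_envelope_power(envelope):
    """AC power of a temporal envelope divided by its squared mean (DC)."""
    n = len(envelope)
    mean = sum(envelope) / n
    ac_power = sum((x - mean) ** 2 for x in envelope) / n
    return ac_power / mean ** 2


def snr_env_db(env_mix, env_noise, floor=1e-3):
    """Envelope-domain SNR in dB from noisy-speech and noise-alone envelopes.

    `floor` is an assumed lower limit that keeps the ratio positive.
    """
    p_mix = normalized_envelope_power(env_mix)
    p_noise = max(normalized_envelope_power(env_noise), floor)
    # Speech envelope power is estimated as the excess over the noise alone.
    p_speech = max(p_mix - p_noise, floor)
    return 10.0 * math.log10(p_speech / p_noise)


def across_channel_variance(channel_values):
    """Population variance of one value per audio channel (illustrative only)."""
    n = len(channel_values)
    mean = sum(channel_values) / n
    return sum((v - mean) ** 2 for v in channel_values) / n


# Synthetic envelopes: a 4-cycle sinusoidal modulation around a DC of 1.
n = 1000
env_noise = [1.0 + 0.1 * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
env_mix = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]

# Normalized powers are 0.5**2 / 2 and 0.1**2 / 2, so the ratio is 24.
result_db = snr_env_db(env_mix, env_noise)
print(round(result_db, 2))  # about 13.8 dB
```

In this toy case the stronger modulation in the mixture gives a positive SNRenv. A distortion that leaves each channel's envelope power unchanged would leave this value unchanged too, which is the motivation the abstract gives for adding an across-frequency measure.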
Similar resources
Prediction of intelligibility of noisy and time-frequency weighted speech based on mutual information between amplitude envelopes
This paper deals with the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal-hearing listeners. We propose a prediction model based on the hypothesis that intelligibility is monotonically related to the amount of Shannon information the critical-band amplitude envelopes of the noisy/processed signal convey ab...
SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech
Most of the existing intelligibility measures do not account for the distortions present in processed speech, such as those introduced by speech-enhancement algorithms. In the present study, we propose three new objective measures that can be used for prediction of intelligibility of processed (e.g., via an enhancement algorithm) speech in noisy conditions. All three measures use a critical-ban...
The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech
The objective intelligibility assessment of nonlinearly enhanced speech is a widely experienced problem. Nonlinear speech enhancement processors operate primarily on the low-level and transient components of speech. As these sections contain important acoustic cues as well as context-constitutive information, they dominate speech intelligibility. For that reason, short-time intelligibility measu...
Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs
Automatic prediction of speech intelligibility is highly desirable in the speech research community, since listening tests are time-consuming and cannot be used online. Most of the available objective speech intelligibility measures are intrusive methods, as they require a clean reference signal in addition to the corresponding noisy/processed signal at hand. In order to overcome the problem of...
Evaluation of Objective Intelligibility Prediction Measures for Speech Enhancement in Mandarin
In this paper, we evaluate how well several state-of-the-art objective measures predict the intelligibility of noisy Mandarin speech processed by speech enhancement algorithms. The speech signals were first corrupted by three types of noise at two signal-to-noise ratios and then processed by four classes of speech enhancement algorithms. The objective intelligibility...